-
Notifications
You must be signed in to change notification settings - Fork 1.4k
zebra: handle cases where ifp ifindex isn't present #19804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
This is what I would be doing. LGTM |
|
A problem: If the race condition happens, kernel_get_speed succeed, but ifp->ifindex is IFINDEX_INTERNAL in if_add_update, then it won't do speed change notification, and also won't reschedule the timer. |
|
thanks, yeah - it might make sense to take the change to zebra_ns.c also, just wanted to poke at the ifindex issues you described.
|
|
won't the RTM_NEWLINK cause the speed update to kick in again? |
Yes, RTM_NEWLINK will enter ADD code path, and call if_update_state_speed(ifp, kernel_get_speed(ifp, NULL)) and if_add_update. But does it start the timer to poll the speed? Or it is not needed? One question, is it necessary to call if_add_update in if_zebra_speed_update? Seems its only purpose is to call zebra_interface_add_update to notify other daemons? |
|
Ok how do you propose that upper level protocols get told of changes to interface state. That is the purpose of that function. |
Can we call zebra_interface_add_update instead of if_add_update in if_zebra_speed_update? Are there other functions useful for speed update in if_add_update that I missed? |
|
Instead of scheduling the 15-second timer in if_zebra_new_hook(), we can start it in zebra_if_dplane_ifp_handling() when we process RTM_NEWLINK, right after retrieving the initial link speed (i.e. after if_update_state_speed(ifp, kernel_get_speed(ifp, NULL))). This way, every interface will have its timer scheduled only once it has emitted a RTM_NEWLINK: With this approach, the timer should never run while ifp->ifindex is invalid, since it is only armed after processing a valid RTM_NEWLINK. Or am I missing something ? |
I considered this in #19792. But I just did a test, seems there is no RTM_NEWLINK event if we just restart zebra. |
|
This makes absolutely no sense. Zebra on startup is going to ask the kernel for the dump of the interfaces. This is going to send a RTM_NEWLINK for every interface in the system |
|
I didn't quite get this either. The NEWLINK would come when/after some device exists in the OS. it's true that that may arrive late, or never arrive - it's ok to have config that refers to interfaces that don't exist, or don't exist at the time the config is being processed. But zebra won't have an ifindex until it gets a that event, so that's why there are sort of a couple of possible orderings of events, as you've described.
|
I will double check, at least I can't see RTM_NEWLINK logs in interface.c |
Something wrong with my logs. There is a small period of time after startup, logs doesn't output to either syslog or zebra.log. But I can confirm that there are RTM_NEWLINK events after zebra restart. In RTM_NEWLINK ADD code path, if_update_state_speed(ifp, kernel_get_speed(ifp, NULL)) and if_add_update(ifp) already notifies daemons about new speed. |
|
@mjstapp -> With this change interfaces that are named in the config but never created, will cause the if_zebra_speed_update function to always run for those interfaces. This does fix the original reported problem as that I was able to see that upper level protocols have the right ifindex. This begs the question of how do we ensure that this is working in the future? |
|
I have do this commit to fix the issue: @markx-arista Could you please to give a try ? It should fix the issue, I didn't have the time to test it yet. |
|
hmm - I left that in place because that was the case already?
|
|
so to clarify. My original code does this for interfaces that do not have a ifindex yet: These stop after 4 minutes. So even after 4 minutes querying is not stopping. Additionally a new speed with a ifindex value set should not be stopping the querying. I've seen cases where newspeed didn't start even changing till after ~20 seconds. |
|
ok, I've been talking through the cases with Donald, and I think we've agreed on the logic that we want. we'll only trigger the speed querying from the device notification path (the NEWLINK path e.g.). we'll keep polling until a) we get a non-zero speed that is unchanged from the previous query, or b) we hit the polling limit. |
@mjstapp I agree with that. I implemented something similar here: maxime-leroy@d22e368 Right now, the timer that triggers the speed query is scheduled in two situations: Do you think we should keep these two separate behaviors, or simplify the logic? |
There was an odd commented-out line; remove it? Signed-off-by: Mark Stapp <[email protected]>
Handle cases where ifp does not have an ifindex (i.e. doesn't yet exist) when querying for interface speed, and when handling new interfaces. Only initiate the speed polling timer when we're notified about the device (via the OS) or when the device goes to UP state. Move some speed-query timer management to a helper function. Signed-off-by: Mark Stapp <[email protected]>
2f06139 to
6086568
Compare
|
I've pushed some changes along the lines we discussed above. I've left the speed-query polling in place in two paths: the interface device notification, and the UP status notification. |
|
I propose this: markx-arista@e5cb8fd
|
Right now, our differences are:
|
I think we should skip non-physical interfaces in the timer, we get 0 speed from them. |
|
Yeah no. Bonds/Lags are not real interfaces they have speed. Veth pairs have speeds. There are people working on making these speeds changeable so that more real world testing can be done. DO NOT limit changes like this to only physical interfaces you will break so much stuff it's not funny |
I see, thanks. Originally, the timer stops if the new and old speed are equal. I wonder if we need to keep polling if it is 0. |
See: fbc83b9 The only two cases where we should stop polling for interface speed are:
As explained here: There are no netlink notifications that inform Zebra of link speed changes. We only get carrier (UP/DOWN) notifications via rtnetlink. Support for LINKMODES_NTF was added in Linux 5.6: However, this notification is only sent when link modes are changed via ethtool SET operations (ETHTOOL_MSG_LINKMODES_SET, ETHTOOL_SLINKSETTINGS, or ETHTOOL_SSET). It is not triggered by PHY autonegotiation, meaning it does not cover spontaneous speed changes when a cable is plugged/unplugged or when the PHY renegotiates. Therefore, polling remains the only reliable mechanism for detecting link speed changes caused by autoneg. |
| if_update_state_speed(ifp, new_speed); | ||
| if_add_update(ifp); | ||
| changed = true; | ||
| if (error == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code does not re-schedule the timer on INTERFACE_SPEED_ERROR_READ. Your change makes it behave like INTERFACE_SPEED_ERROR_UNKNOWN, which is a behavior change — why is this needed?
If both errors are handled the same way, is there still any point in having two separate error codes?
This logic change should be in a separate commit, with a short explanation of the intent, to ensure we don’t break anything.
Anyway, it’s not clear why INTERFACE_SPEED_ERROR_READ was introduced in the original commit, and the commit message does not explain the intent:
51cb6ae
@pguibert6WIND could you give more information ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#19794 (comment)
We can get a lot of INTERFACE_SPEED_ERROR_READ after bootup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’re seeing many INTERFACE_SPEED_ERROR_READ errors because the timer runs for interfaces that don’t yet exist in the OS. By only starting the timers after receiving RTM_NEWLINK, we ensure the interface is present, so the ioctl call will succeed. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I think that you should look at is why are you starting up FRR on system boot before the underlying network is ready to go? What do you expect FRR to be doing when it has no interfaces to do any work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’re seeing many INTERFACE_SPEED_ERROR_READ errors because the timer runs for interfaces that don’t yet exist in the OS. By only starting the timers after receiving RTM_NEWLINK, we ensure the interface is present, so the ioctl call will succeed. Am I missing something?
You are right about that if we put code in RTM_NEWLINK path, so we don't need to change the error handling logic here.
I don't understand why 51cb6ae assigns INTERFACE_SPEED_ERROR_READ to vrf_socket error but only ENODEV errno for vrf_ioctl, and stop the polling immediately. For other errors from vrf_ioctl, if_zebra_speed_update just gets both 0 error and 0 speed, so we can't treat error == 0 as there is no error.
| /* Don't process if we don't have a valid device */ | ||
| if (ifp->ifindex == IFINDEX_INTERNAL) | ||
| return; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should not happened anymore ? as if_zebra_speed_update is called in zebra_if_dplane_ifp_handling (when we received RTM_NEWLINK with a valid index).
| zebra_if->speed_update_count = 0; | ||
| event_add_timer(zrouter.master, if_zebra_speed_update, ifp, 15, | ||
| &zebra_if->speed_update); | ||
| event_ignore_late_timer(zebra_if->speed_update); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timer is no longer scheduled after 15 seconds, but after 5 seconds. Is there a reason for this change?
|
I just found that the BGP neighbor down or zebra assertion issue was actually caused by this commit from last year: d349877. It replaced if_link_per_ns with zebra_ns_link_ifp, and if_link_per_ns actually checks ifp->ifindex == IFINDEX_INTERNAL, but zebra_ns_link_ifp doesn't. |
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
|
Mark where are we with this one? |
|
still chugging away, I guess. I think my sense of the priorities was: first close the bug, then get through the change that moves the speed-reading to the dplane, and then come back through and conclude on the triggers for refreshing the speed.
|
Handle cases where ifp does not have an ifindex (i.e. doesn't yet exist) when querying for interface speed, and when handling new interfaces. If we're trying to query interface speed, continue trying even if we receive an error. Limit the processing of a new interface (via configuration e.g.) that doesn't yet have an ifindex.